Global distribution, climatic preferences and photosynthesis‐related traits of C4 eudicots and how they differ from those of C4 grasses

Abstract C4 is one of three known photosynthetic processes of carbon fixation in flowering plants. It evolved independently more than 61 times in multiple angiosperm lineages and consists of a series of anatomical and biochemical modifications to the ancestral C3 pathway that increase plant productivity under warm and light-rich conditions. The C4 lineages of eudicots belong to seven orders and 15 families, are phylogenetically less constrained than those of monocots and entail an enormous structural and ecological diversity. Eudicot C4 lineages likely evolved the C4 syndrome along different evolutionary paths. Therefore, a better understanding of this diversity is key to understanding the evolution of this complex trait as a whole. By compiling 1207 recognised C4 eudicot species described in the literature and presenting trait data for these species, we identify global centres of species richness and of high phylogenetic diversity. Furthermore, we discuss climatic preferences in the context of plant functional traits. We identify two hotspots of C4 eudicot diversity: arid regions of Mexico/Southern United States and Australia, which show a similarly high number of different C4 eudicot genera but differ in the number of C4 lineages that evolved in situ. Further eudicot C4 hotspots with many different families and genera are in South Africa, West Africa, Patagonia, Central Asia and the Mediterranean. In general, C4 eudicots are diverse in deserts and xeric shrublands, tropical and subtropical grasslands, savannas and shrublands. We found C4 eudicots to occur in areas with less annual precipitation than C4 grasses, which can be explained by frequently associated adaptations to drought stress such as succulence and salt tolerance. The data indicate that C4 eudicot lineages utilising the NAD-ME decarboxylating enzyme grow in drier areas than those using the NADP-ME decarboxylating enzyme, indicating biochemical restrictions of the latter system at higher temperatures. We conclude that in most eudicot lineages, C4 evolved in ancestrally already drought-adapted clades and enabled these to spread further in these habitats and colonise even drier areas.


| INTRODUCTION
By the early 1950s, it was widely assumed that all plants use the same C3 carbon fixation pathway, the Calvin-Benson-Bassham cycle (CBB cycle; Bassham et al., 1950). Shortly after a brief note on the discovery of a four-carbon CO2 fixation pathway in sugarcane (now known as C4 photosynthesis) was published in the 1954 Annual Report of the Hawaiian Sugar Planters Association Experiment Station (Burr et al., 1957; Hatch, 2005), researchers set out to investigate this unexplored photosynthetic pathway. They found that the CBB cycle and RuBisCO were restricted to bundle sheath cells (BSC) and that in the mesophyll cells (MC) an auxiliary carbon-fixing pathway with phosphoenolpyruvate carboxylase (PEPC) as the key enzyme generated C4 molecules that are transported into the BSC and fuel the CBB cycle. Since its discovery, understanding C4 photosynthesis has become a vibrant research discipline, integrating the fields of biochemistry, physiology, organismic biology, ecology and evolution (Langdale, 2011). Evolving knowledge about the C4 pathways has been published in various reviews (see Furbank and Kelly, 2021; Niklaus and Kelly, 2019; Sage et al., 2018; Schlüter and Weber, 2020 for four recent ones on different aspects of C4 photosynthesis) and special issues (e.g., JXB special issue: C4 Photosynthesis - 50 years of discovery and innovation; Von Caemmerer et al., 2017).
C4 photosynthesis evolved in at least 18 angiosperm families and more than 60 times independently (Sage, 2017; Sage et al., 2018; Figure 1a). Around 80% of the C4 species are found in the Poales, with 5044 C4 species in Poaceae and 1322 C4 species in Cyperaceae. With two C4 species in Hydrocharitaceae (Alismatales), this adds up to 6368 C4 species in 339 genera in monocots, as opposed to around 1777 eudicot C4 species in 79 genera (Sage, 2017 and references therein) (Figure 1b). Interestingly, the eudicot C4 lineages are phylogenetically more evenly distributed and occur in three rosid and four asterid families belonging to six different orders. Nevertheless, the most species-rich C4 eudicot clades are restricted mostly to eight families of the Caryophyllales (Figure 1a, based on Sage, 2017: table 3).

Figure 1 (caption fragment): [...] Sage (2017); the second number represents the number of C4 species with verified information about C4 photosynthesis performance (Table S1); the number in bold shows the remaining number of C4 species after various data cleaning steps (Table 1). Proportions shown in the diagrams and mentioned in the text were calculated according to species numbers in Sage (2017) but do not change much when based on the other two reduced numbers.

[...] C4 eudicot family, followed by Euphorbiaceae (Malpighiales, rosids), Asteraceae and Boraginaceae (Asterales and Lamiales, respectively, asterids; Figure 1a). Although C4 photosynthesis in eudicots is phylogenetically more widespread and ecologically and structurally more diverse than in monocots (e.g., Muhaidat et al., 2007; Rudov et al., 2020), the latter have received more attention (mainly in the Poaceae). This is partly due to C4 grasses, such as maize and sugarcane, being initial model species of C4 research, thus making their close relatives the focus of C4 research even today (Hatch, 2005). Furthermore, the research focus on grasses can be attributed to their great economic and ecological importance (Linder et al., 2018). However, understanding the diversity of the C4 syndrome in eudicots is key to understanding the evolution of this complex trait (Heyduk et al., 2019) because C4 eudicot lineages evolved the C4 syndrome along different evolutionary paths (e.g., Bohley et al., 2015; Kadereit et al., 2012; Lauterbach et al., 2019).
C4 photosynthesis, which includes an auxiliary pathway to reduce photorespiration, likely arose in hot, dry and/or saline regions where C3 photosynthesis performance is reduced (Sage et al., 2018). This reduction in photorespiration is achieved by the carbon-fixing enzyme phosphoenolpyruvate carboxylase (PEPC) and by generating a locally high CO2 concentration around the key enzyme RuBisCO, which reduces its oxygenase activity (Sage et al., 2012). C4 photosynthesis is usually associated with warm habitats with high evapotranspiration. Yet, the distribution of C4 plants cannot be explained entirely by individual environmental factors (Christin & Osborne, 2014) because C4 species occur in a variety of habitats, for example on nutrient-poor or fertile soils, in the tropics, in deserts or in the boreal zone, on open grasslands or in forest undergrowth (Collins & Jones, 1986; Mahdavi & Bergmeier, 2018; Rudov et al., 2020). This diversity results from the multifaceted evolutionary history of the C4 pathway (Christin & Osborne, 2014; Sage et al., 2011, 2018). However, studies linking the evolution of adaptive traits and ecological niches in C4 lineages are still scarce. Lundgren et al. (2015), for example, showed that C4 photosynthesis does not initially lead to a shift of the ancestral niche in Alloteropsis semialata J. Presl (Poaceae), but rather expands its niche to cover a wider range of conditions that include the ancestral ones. This improves the success of occasional long-range dispersal events and thus increases the geographical range (Lundgren et al., 2015). As C4 photosynthesis is a complex syndrome that increases the efficiency with which plants use available water and nitrogen, C4 might be advantageous under various environmental conditions, but it evolved predominantly in the tropics and subtropics (Griffiths et al., 2013; Sage et al., 2018).

| Global expansion of C4 grasses
C4 plants account for one-quarter of the Earth's primary terrestrial production, and almost a quarter of the Earth's surface is dominated by C4 grasslands and savannas (Barbehenn et al., 2004; Grace et al., 2006; Sage et al., 2018). C4 grasses likely intruded into C3 grasslands and forests from open biomes of warm regions, subsequently replaced them during the late Miocene to the Pliocene (3-8 Mya) and expanded worldwide into drier biomes (Edwards & Smith, 2010; Ehleringer et al., 1997; Osborne & Freckleton, 2009). However, molecular research suggested that C4 photosynthesis in several grass lineages evolved earlier, around 18-30 Mya (mid/late Oligocene), presumably in warm, arid locations where water limitation was the main selective force favouring reduced photorespiration (Christin et al., 2011; Zhou et al., 2018). As the atmospheric CO2 in the late Miocene fell below ~300 ppm (Royer, 2006), a small number of hyperdominant C4 grass species that were able to outcompete C3 and C4 relatives became dominant due to their advantage of a low CO2 compensation point (Christin & Osborne, 2014; Lehmann et al., 2019). South America seems to be the major hotspot for the origin of C4 grasses (Sage et al., 2011), and today, C4 grasses are confined mostly to tropical and subtropical areas (Shoko et al., 2016; Woodward et al., 2004; Woodward & Lomas, 2004). Climatic patterns and the distribution of C4 grasses in North America suggest that high minimum temperatures during the growing season favour C4 grasses at the regional scale (Teeri & Stowe, 1976). Yet, at the local scale, topographic and edaphic variables may exert more influence (Yan & de Beurs, 2016).

| Evolutionary and ecological diversity of C4 photosynthesis in eudicots
Individual C4 lineages originated independently from the Oligocene into the Quaternary (Christin et al., 2011; Niklaus & Kelly, 2019). In Amaranthaceae s.l., which includes the largest number of C4 lineages in eudicots, the earliest assumed origins of C4 date back to the Oligocene and are roughly as old as the oldest C4 grass subfamily Chloridoideae, which evolved around 32-25 Mya. This implies that C4 eudicots are not per se younger than C4 monocots (Christin et al., 2008, 2011; Kadereit et al., 2012). However, many C4 lineages within eudicots (as well as in the monocots) originated in the Miocene, when the climate became increasingly dry (Kadereit et al., 2003, 2010). In addition, there are many evolutionarily young C4 lineages in eudicots, for instance in Flaveria (Asteraceae), Sesuvium (Aizoaceae) and Tecticornia (Chenopodiaceae s.s.), that arose approximately 5-1 Mya (Christin et al., 2011; Kadereit et al., 2012; Sage et al., 2012). Since the range size of C4 lineages as well as the physiological refinement of the C4 syndrome is highly dependent on time, the age of the respective C4 lineage needs to be taken into account when lineages are compared to each other (Niklaus & Kelly, 2019).
Although South America seems to be the hotspot of origins of the nowadays cosmopolitan C4 grasses, six geographic regions were highlighted as potential ancestral areas for C4 eudicot lineages. For most C4 eudicot lineages, Central Asia, North America, South Africa, northeast Africa and Arabia count as centres of origin (Kadereit & Freitag, 2011; Sage, 2016; Sage et al., 2011), an assessment based mainly on current distribution that still awaits review through detailed phylogenetic and biogeographical studies of the individual eudicot lineages (e.g., Lauterbach et al., 2019). Due to the diverse nature of C4 eudicots, no list of the global distribution of C4 eudicots has been compiled thus far.
Most succulent C4 species tolerate elevated salinity, suggesting that their succulence is primarily an evolutionary response to (physiological) drought. While in grasses a repeated gain and loss of salt tolerance throughout the history of the family prevails and halophytic grass species are isolated at the tips of the phylogeny (Bromham & Bennett, 2014), there are multiple evolutionarily older halophytic lineages among eudicots that additionally acquired C4 photosynthesis. This is particularly the case for C4 lineages of Amaranthaceae (Kadereit et al., 2012, 2017; Piirainen et al., 2017) but also for Gisekiaceae (Bissinger et al., 2014), Sesuvioideae-Aizoaceae (Bohley et al., 2015) and Euphorbiaceae (Ghazanfar et al., 2014; Rudov et al., 2020). Some eudicot lineages acquired even further alternative carbon fixation pathways. The widespread succulent annual Portulaca oleracea L. is a halophytic C4 species that is able to conduct both C4 and CAM photosynthesis depending on the environmental conditions (Ferrari et al., 2020, 2022).
Despite anatomical and ecological differences, the biochemical forms that exist in C4 photosynthesis are similar in grasses and eudicots. There are three biochemical subtypes in both grasses and eudicots, which are usually constant within a C4 lineage but may vary within and between plant families: NADP-malic enzyme (ME; e.g., in Caryophyllaceae; Sage et al., 2011), NAD-ME (e.g., C4 species in Boraginaceae, Cleomaceae; Muhaidat et al., 2007), and a third decarboxylating enzyme, PEP-CK, that is more common in C4 monocots (Wang et al., 2014). Due to the high number of fast-growing and highly productive C4 grasses, many of which are interesting biofuel crops and phytoremediation plants (such as Miscanthus; Pidlisnyuk et al., 2014), one might assume that C4 grasses are more competitive than C4 eudicots given the right growing conditions. Interestingly, however, the species with the fastest CO2 assimilation rate, 80 μmol m⁻² s⁻¹ at 325 μmol mol⁻¹, is not a grass but Amaranthus palmeri S. Watson (Amaranthaceae; Ehleringer, 1983; Sage, 2017).

| Scope and aims
C4 grasses comprise the majority of C4 species and dominate in biomass production, yet the anatomical, physiological and ecological diversity of C4 syndromes seems larger in C4 eudicots. While shifts to C4 physiology in grasses probably represent a pre-adaptation to open and arid subtropical habitats (Edwards & Smith, 2010; Osborne & Freckleton, 2009), the evolution of the C4 pathway in eudicots, e.g., Amaranthaceae s.l. (incl. Chenopodiaceae), Nyctaginaceae and Sesuvioideae, is more likely a post-adaptation to the selection pressure in dry, saline and coastal environments that enabled survival in these habitats (Bohley et al., 2015; Kadereit et al., 2012; Khoshravesh et al., 2020). Stowe and Teeri (1978) already suggested that C4 eudicots do not follow the climate preferences that have been reported for C4 grasses and therefore might have followed a different evolutionary pathway to C4.
In this study, we aimed to characterise the global occurrence of C4 eudicots, identify diversity hotspots and climatic preferences, and assign these to functional traits such as succulence, salt tolerance, biochemical subtype and anatomical leaf type. We hypothesised that the phylogenetic and structural diversity of C4 eudicots is reflected in their colonisation of a wide range of climatic regions and environments and that the combination of C4 photosynthesis with other traits enabled C4 eudicots to invade areas not or less frequently colonised by C4 grasses. To test this hypothesis, we compare the biogeographic patterns found in eudicots with those of the more species-rich C4 grasses. Moreover, we link the geographical origins of C4 photosynthesis to the diversity hotspots discovered in those C4 eudicot lineages examined within a phylogenetic framework.

| MATERIALS AND METHODS
We compiled an initial dataset of C4 eudicots according to Sage (2017). This list consisted of 16 eudicot families with an indication of lineages and the number of C4 species per lineage (Table S1). In order to list each C4 species, literature research was conducted. If trait data were available, we recorded leaf anatomy, biochemical subtypes (NAD-ME, NADP-ME), succulence, woodiness, salt tolerance and life form (perennial, annual) from floras, revisions, reports, databases and online sources (see sources in Table S1). To reduce the artificial inflation of species numbers and distribution areas due to synonymisation, we cross-checked for synonyms using plantsoftheworldonline.org (POWO, 2020). In grasses, there are over 60,000 published scientific names corresponding to approximately 11,313 accepted species (Clayton et al., 2002 onwards; Osborne et al., 2014). Our list of C4 eudicot species includes members of 15 families (Chenopodiaceae included in Amaranthaceae) with 1207 accepted species and a total of around 3969 synonyms. These 1207 species have verified information about C4 photosynthesis performance, and the literature and/or online resources provided detailed information about the traits discussed above. For comparison with C4 grasses, we compiled a list of 309 genera following Osborne et al. (2014) (Table S1), as all species of these genera are assumed to perform C4 photosynthesis.

| Data cleaning
Since georeferenced occurrence records from public datasets such as gbif.org are error-prone (Maldonado et al., 2015; Zizka et al., 2020), automated data cleaning of the C4 eudicot and C4 grass coordinate datasets was performed with the "CoordinateCleaner" v2.0-18 package in R (Zizka et al., 2019) using the default options. Following the process outlined in Zizka et al. (2019), we removed erroneous records within 1000 m of country and/or province centroids, within 10,000 m of countries' capitals, within urban areas, near GBIF headquarters and near biodiversity institutions, as well as records with zero coordinates, identical latitude and longitude values, or locations on an ocean surface. In addition to the "CoordinateCleaner" step, the dataset was manually checked for incorrect synonymisation relying on plantsoftheworldonline.org, because the taxonomic backbone of GBIF does not always follow the currently accepted taxonomic treatments of "The International Plant Names Index" (IPNI, 2020) and the "World Checklist of Vascular Plants" (WCVP, Govaerts et al., 2021). In addition, duplicate records, identified by species name and coordinates, were removed.
Likewise, taxonomic reliability was checked using the distribution information of plantsoftheworldonline.org, and occurrence points considered incorrect because they lie outside the native ranges of species were excluded. As a result of all cleaning steps, the number of coordinates was reduced from 1,012,557 to 247,205 occurrence points.
The cleaning of 2,296,101 occurrence records of C4 grasses was also carried out with the "CoordinateCleaner" package. Additionally, the occurrence points outside the native ranges and duplicate coordinates per species were excluded. Manual cleaning of the incorrectly synonymised species was not carried out here, as we focussed on the genus level only.
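The logic of some of the automated filters described above can be illustrated with a minimal Python sketch. It is not the "CoordinateCleaner" implementation (the study used the R package with default options); the function names, the spherical distance approximation and the toy records, centroids and capitals are assumptions made purely for illustration, and only the 1000 m and 10,000 m radii are taken from the text.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def clean_records(records, centroids, capitals,
                  centroid_radius_m=1_000, capital_radius_m=10_000):
    """Drop flagged records and (species, lat, lon) duplicates, keeping input order."""
    seen = set()
    kept = []
    for species, lat, lon in records:
        if lat == 0 or lon == 0:  # zero coordinates
            continue
        if lat == lon:  # identical latitude and longitude
            continue
        if any(haversine_m(lat, lon, c_lat, c_lon) <= centroid_radius_m
               for c_lat, c_lon in centroids):
            continue
        if any(haversine_m(lat, lon, c_lat, c_lon) <= capital_radius_m
               for c_lat, c_lon in capitals):
            continue
        key = (species, lat, lon)
        if key in seen:  # duplicate of an already retained record
            continue
        seen.add(key)
        kept.append(key)
    return kept


# Toy data: coordinates and reference points are invented for illustration.
records = [
    ("Amaranthus palmeri", 32.2, -110.9),
    ("Amaranthus palmeri", 32.2, -110.9),       # duplicate
    ("Portulaca oleracea", 0.0, 0.0),           # zero coordinates
    ("Portulaca oleracea", 19.4326, -99.1332),  # on a hypothetical capital
    ("Atriplex sp.", -25.0, 134.0),             # on a hypothetical country centroid
    ("Tecticornia sp.", -30.5, 121.3),
]
cleaned = clean_records(records, centroids=[(-25.0, 134.0)], capitals=[(19.4326, -99.1332)])
```

The urban-area, institution, sea and native-range filters are omitted here because they require external reference layers.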

| Analyses
We used 100 × 100 km grid cells to infer geographic patterns of C4 species richness, with an equal-area Behrmann projection. Species richness maps for each C4 eudicot family for the uncleaned and cleaned GBIF dataset were generated using the package "speciesgeocodeR" v2.0-10 (Töpel et al., 2017; Figures S1-S15). Grids with species numbers were calculated using RichnessGrid. In addition to the individual species richness maps for each C4 eudicot family, total species richness maps for C4 eudicots and C4 grasses were created with the cleaned datasets. These maps provide information on the total distribution of both groups and show outstanding regions of C4 species richness. Grid cells with a richness above 50 species were defined as C4 species hotspots of high diversity.
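As an illustration of the gridding step, the following Python sketch projects occurrence points with the Behrmann cylindrical equal-area projection (standard parallel 30°), counts distinct species per 100 × 100 km cell and flags cells above a richness threshold. It is a simplified stand-in for the RichnessGrid workflow, not its implementation; the spherical Earth radius, the function names and the toy records are assumptions.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0                 # spherical approximation (assumption)
STANDARD_PARALLEL = math.radians(30.0)   # Behrmann: equal-area, standard parallel 30 degrees
CELL_KM = 100.0


def behrmann_xy(lat_deg, lon_deg):
    """Project degrees to Behrmann equal-area coordinates in kilometres."""
    x = EARTH_RADIUS_KM * math.radians(lon_deg) * math.cos(STANDARD_PARALLEL)
    y = EARTH_RADIUS_KM * math.sin(math.radians(lat_deg)) / math.cos(STANDARD_PARALLEL)
    return x, y


def richness_grid(records, cell_km=CELL_KM):
    """Count distinct species per grid cell; records are (species, lat, lon)."""
    cells = defaultdict(set)
    for species, lat, lon in records:
        x, y = behrmann_xy(lat, lon)
        cells[(math.floor(x / cell_km), math.floor(y / cell_km))].add(species)
    return {cell: len(species_set) for cell, species_set in cells.items()}


def hotspots(richness, threshold=50):
    """Cells whose richness exceeds the threshold (above 50 species in the text)."""
    return {cell for cell, n in richness.items() if n > threshold}


# Toy records with placeholder species labels.
grid = richness_grid([
    ("species_a", 10.0, 10.0),
    ("species_b", 10.0, 10.0),
    ("species_a", 10.0, 10.0),   # repeated record, counted once
    ("species_c", -40.0, 100.0),
])
```

Because the projection is equal-area, every cell covers the same surface, so richness values are comparable across latitudes.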
Since data obtained from GBIF may be biased by unequal sampling in different areas (Hughes et al., 2021; Zizka et al., 2021), we additionally obtained, for comparison, data on species occurrences at Level 3 of the World Geographical Scheme for Recording Plant Distributions (TDWG, 2001) from the World Checklist of Vascular Plants (WCVP, Govaerts et al., 2021). The WCVP contains a complete (to the best of current knowledge) list of all vascular plant species and hence is less biased by differences in sampling, but at the cost of lower spatial resolution and unequally sized sample areas (Antonelli et al., 2023; Schellenberger Costa et al., 2023). We obtained the distribution of C4 eudicots from WCVP using the rWCVP package (Brown et al., 2023). First, we matched the names of our species list with WCVP using the wcvp_match_names function, with subsequent manual resolution of multiple matches (Table S2), and then obtained all native, present and not doubtful occurrences of these species. For C4 grasses, we first matched the list of our genera with WCVP (Table S2), with subsequent manual resolution of all multiple matches, and then obtained the native, present and not doubtful distributions of all species in these genera.
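The occurrence filter described above (native, present, not doubtful) can be sketched in Python as follows. This is not rWCVP code. The column names follow our understanding of the WCVP distribution table (`introduced`, `extinct`, `location_doubtful`, `area_code_l3`) and should be treated as assumptions, as should the toy rows and identifiers.

```python
def native_present_occurrences(distribution_rows, accepted_ids):
    """Return {plant_name_id: set of Level-3 area codes} for native, extant,
    non-doubtful records of the requested names."""
    result = {}
    for row in distribution_rows:
        if row["plant_name_id"] not in accepted_ids:
            continue
        # Flags are assumed to be coded 0/1; any set flag excludes the record.
        if row["introduced"] or row["extinct"] or row["location_doubtful"]:
            continue
        result.setdefault(row["plant_name_id"], set()).add(row["area_code_l3"])
    return result


# Toy rows with invented identifiers.
rows = [
    {"plant_name_id": 1, "area_code_l3": "MXE", "introduced": 0, "extinct": 0, "location_doubtful": 0},
    {"plant_name_id": 1, "area_code_l3": "IND", "introduced": 1, "extinct": 0, "location_doubtful": 0},
    {"plant_name_id": 1, "area_code_l3": "NSW", "introduced": 0, "extinct": 0, "location_doubtful": 1},
    {"plant_name_id": 2, "area_code_l3": "CPP", "introduced": 0, "extinct": 0, "location_doubtful": 0},
    {"plant_name_id": 3, "area_code_l3": "MXE", "introduced": 0, "extinct": 0, "location_doubtful": 0},
]
occurrences = native_present_occurrences(rows, accepted_ids={1, 2})
```

Species richness per botanical country then follows from counting how many names list each area code.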
To compare the importance of individual regions between C4 eudicots and C4 grasses, we calculated the rank difference for each region by first ranking each region by its number of C4 eudicot species and by its number of C4 grass species (most species = 1, fewest species = 354), using a mean tie-breaker (Figure 2g). We only included botanical countries with at least one C4 grass and at least one C4 eudicot.
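A minimal Python sketch of such a rank-difference calculation is given below. The sign convention (grass rank minus eudicot rank, so that positive values mark regions that rank higher for eudicots) is an assumption, since the text does not state it, and the region labels and counts are invented.

```python
def mean_ranks(counts):
    """Rank regions so the highest count gets rank 1; tied regions share the mean rank."""
    ordered = sorted(counts, key=counts.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and counts[ordered[j + 1]] == counts[ordered[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2  # mean of the 1-based positions i+1 .. j+1
        for region in ordered[i:j + 1]:
            ranks[region] = mean_rank
        i = j + 1
    return ranks


def rank_difference(eudicot_counts, grass_counts):
    """Grass rank minus eudicot rank for regions holding at least one species of each group."""
    shared = [r for r in eudicot_counts
              if eudicot_counts[r] > 0 and grass_counts.get(r, 0) > 0]
    eudicot_rank = mean_ranks({r: eudicot_counts[r] for r in shared})
    grass_rank = mean_ranks({r: grass_counts[r] for r in shared})
    return {r: grass_rank[r] - eudicot_rank[r] for r in shared}


# Toy species counts per hypothetical region.
eudicots = {"A": 30, "B": 10, "C": 10, "D": 5}
grasses = {"A": 20, "B": 50, "C": 40, "D": 0}
diff = rank_difference(eudicots, grasses)
```

Region D is dropped because it holds no grass species, and the tied regions B and C both receive the eudicot rank 2.5.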
Two bioclimatic variables (Bio1 - Annual Mean Temperature (°C*10); Bio12 - Annual Mean Precipitation (mm)) were extracted from WorldClim v.2 with a spatial resolution of 10 min (~340 km²) (Fick & Hijmans, 2017). We plotted annual mean temperature and precipitation values using ggplot2 to compare the distribution of C4 eudicots and C4 grasses along these two climatic variables (Wickham et al., 2016). Since the focus here is mainly on comparing these two groups and there is probably no systematic difference in bias between them, the cleaned GBIF dataset was used for this analysis. We are aware that this approach does not integrate biogeographical history, which is beyond the scope of this paper. Likewise, boxplots for each family of C4 eudicots, the C4 grasses and all C4 eudicots together were calculated in relation to the climate variables by first calculating the mean values of each species for each climate variable. Statistical analyses were conducted in R v4.0.2 (R Core Team, 2020) using RStudio v1.2.5042 (RStudio, Inc., 2009-2020) and R Commander (Fox, 2005). We used a Kruskal-Wallis test (between C4 eudicot families), followed by a post-hoc Tukey-Kramer test to determine where the differences were, with a p-value ≤ .05 being considered statistically significant, and a Mann-Whitney U test (between C4 eudicots and C4 grasses) to determine significance with respect to annual mean precipitation and annual mean temperature. A one-way ANOVA was performed to determine whether eudicot C4 species with NAD-ME as the primary decarboxylating enzyme are distributed in areas with significantly lower annual precipitation than eudicot C4 species of the NADP-ME subtype.
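The two-step comparison between groups (per-species means first, then a rank-based test) can be sketched in Python as follows. The sketch computes only the Mann-Whitney U statistics with mid-ranks for ties and no p-value; the study itself used R, and the function names and toy values are assumptions.

```python
from collections import defaultdict
from statistics import mean


def species_means(occurrences):
    """Average a climate variable per species; occurrences are (species, value) pairs."""
    by_species = defaultdict(list)
    for species, value in occurrences:
        by_species[species].append(value)
    return {species: mean(values) for species, values in by_species.items()}


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b), assigning mid-ranks to tied values."""
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j + 1) / 2  # mean 1-based rank of the tied block
        rank_sum_a += mid_rank * sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


# Toy per-species mean annual precipitation values (mm), invented for illustration.
eudicot_precip = [200, 300, 400]
grass_precip = [500, 600, 700, 350]
u_eudicots, u_grasses = mann_whitney_u(eudicot_precip, grass_precip)
```

Averaging per species before testing keeps densely sampled species from dominating the comparison. A small U for one group indicates that its values tend to be lower than those of the other group.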
To display the distribution areas of each C4 eudicot family at the biome level (Olson et al., 2001), the table in Figure 4a was created. It shows the number of species of each family within a biome (Terrestrial ecoregions of the world; sensu Olson et al., 2001). Only the plant families with the highest number of species in this study according to Figure 1a were selected for this table.
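The construction of such a family-by-biome table can be sketched in Python as follows. The 5% presence threshold is taken from the figure caption (a species counts as present in a biome if at least 5% of its distribution points fall within it); the function name, data layout and toy records are assumptions.

```python
from collections import Counter, defaultdict


def species_per_biome(points, min_share=0.05):
    """Build {family: {biome: number of species present}} from (family, species, biome) points.

    A species counts as present in a biome if at least `min_share` of its points fall there.
    """
    per_species = defaultdict(Counter)
    family_of = {}
    for family, species, biome in points:
        per_species[species][biome] += 1
        family_of[species] = family

    table = defaultdict(Counter)
    for species, biome_counts in per_species.items():
        total = sum(biome_counts.values())
        for biome, n in biome_counts.items():
            if n / total >= min_share:
                table[family_of[species]][biome] += 1
    return {family: dict(counts) for family, counts in table.items()}


# Toy points: sp1 has 39 desert points and 1 grassland point (2.5%, below the threshold),
# sp2 is split evenly between both biomes.
points = (
    [("Amaranthaceae", "sp1", "Deserts")] * 39
    + [("Amaranthaceae", "sp1", "Grasslands")]
    + [("Amaranthaceae", "sp2", "Deserts")] * 5
    + [("Amaranthaceae", "sp2", "Grasslands")] * 5
)
biome_table = species_per_biome(points)
```

The threshold keeps single stray records from registering a species in a biome where it is effectively absent.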

| Literature survey
To place our findings on the diversity hotspots of C4 eudicots in a spatiotemporal framework, we conducted a literature review of phylogenetic and biogeographical studies incorporating C4 eudicot lineages in order to reveal the current understanding of the origin of C4 photosynthesis in these groups. The aim was to assess whether a lineage developed C4 photosynthesis in a particular area (in situ origin) or developed C4 before colonising a particular area (ex situ origin).

| The impact of data cleaning
We included only species for which direct evidence of C4 photosynthesis (such as C4-like δ13C values or C4 leaf anatomy) is documented in the literature. This was the case for 1207 species of the approximately 1777 C4 eudicots according to Sage (2017). Since Sage (2017) estimates the number of C4 species per lineage, our refined C4 eudicot species list is substantially shorter (Table S1). For 208 of these 1207 C4 eudicot species, no occurrence points were documented in GBIF. Therefore, the final list of C4 eudicots analysed here included 999 species.
Performing the necessary cleaning steps reduced the raw C4 eudicot occurrence dataset, which originally contained more than 1 million records, to less than a quarter (Table 1). The "CoordinateCleaner" package removed 280,935 (27.75%) of the 1,012,557 C4 eudicot records (Table 1). A manual check for erroneous synonymisation removed 153,236 occurrence records from the remaining dataset. In the next step, 155,481 duplicate coordinates were removed. Notably, filtering out the outliers removed a further 175,700 distribution points. After all these cleaning steps, we retained 963 species with a total of 247,205 occurrences. Of these, 620 species were represented by 10 or more records, whereas 343 species were represented by <10 records. Altogether, 75.59% of the occurrences for the C4 eudicots were excluded (Table 1).
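The record counts reported above can be cross-checked with a few lines of Python. All numbers are taken from the text; only the step labels and variable names are ours.

```python
RAW_RECORDS = 1_012_557

# Records removed at each cleaning step, in the order reported in the text.
removed = {
    "CoordinateCleaner": 280_935,
    "erroneous synonymisation": 153_236,
    "duplicate coordinates": 155_481,
    "outliers": 175_700,
}

remaining = RAW_RECORDS
for step, n in removed.items():
    remaining -= n
    print(f"{step}: -{n:,} -> {remaining:,} records left")

retained = RAW_RECORDS - sum(removed.values())
excluded_pct = round(100 * sum(removed.values()) / RAW_RECORDS, 2)
coordinate_cleaner_pct = round(100 * removed["CoordinateCleaner"] / RAW_RECORDS, 2)
```

The four steps sum to 765,352 removed records, which reproduces the 247,205 retained occurrences and the reported shares of 27.75% and 75.59%.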
In the raw C4 grasses dataset, which contained 2,296,101 distribution points for 271 C4 grass genera (GBIF.org, 2020p, 2020q, 2021r, 2020s, 2020t), around 382,595 points (16.66%) were removed after applying the "CoordinateCleaner" package. Excluding [...]

The usability and consequently the sustainable success of large data repositories such as GBIF will thus in the future largely depend on the effort put into the curation of the data. Currently, these data should only be used with caution (Zizka et al., 2020), and a meaningful dataset can only be extracted via several filtering steps, as seen in this study.

| C4 eudicot and C4 grasses comparison
Species richness is a commonly used measure of biodiversity (Albrecht et al., 2021; Gould, 2000). Richness maps are used to explore patterns of richness and help to investigate the processes that shape those patterns. The species richness maps of C4 grasses and C4 eudicots show the generally higher species diversity of C4 grasses (Figure 2a-d).
When C4 eudicots and C4 grasses were considered together, two regions stood out with a high C4 species richness: Mexico/Southern United States and Australia (Figure 2a,c). Additionally, when considering species richness per botanical country, South America can also be identified as a region rich in C4 plant species (Figure 2b,d). [...] Since India is not divided into provinces like the other tropical countries, it stands out due to its size and the associated high number of species. In the more detailed species richness map with a resolution of 100 × 100 km grid cells, India as a region does not seem to have a high C4 species richness (Figure 2a,b). This emphasises the importance of considering different scales and resolutions when analysing ecological patterns.
The rank difference of C4 eudicot versus C4 grass species occurrence revealed the particular importance of deserts and xeric shrublands (e.g., Africa, Arabian Desert, Asia) and the temperate northern hemisphere (e.g., Mediterranean) for C4 eudicots, and of the Afrotropical realm (particularly Madagascar) and the tropical zone of South America, Southeast Asia and Australia for C4 grasses (Figure 2g).
Both C4 grasses and C4 eudicots occurred in a wide range of annual mean temperatures, from 1 to 31.2°C and from 6 to 30.5°C (95% interval), respectively (Figure 3). The median temperature was 19.0°C for C4 grasses and 17.4°C for C4 eudicots. C4 grasses, in addition to increasing in the range of 15-18°C, had a second steep increase in occurrence records between 27 and 30°C, dominated by the subfamilies Chloridoideae and Panicoideae. This second peak could not be observed in the C4 eudicots. Occurrence points of C4 grasses were found across a broad range of annual mean precipitation, from 0 to approx. 2000 mm (95% interval). However, C4 grasses occurred predominantly in semi-wet areas. An increase in occurrence points was seen in the range between 600 and 900 mm, with a median of 772 mm. C4 eudicots, on the other hand, occurred in areas with distinctly less precipitation, with a median of 394 mm (Figure 3). An increase of C4 eudicot occurrence points was observed in regions with approx. 300-600 mm precipitation/year (Figure 3). On a per-continent basis, the occurrence of C4 grasses and C4 eudicots differed most prominently in Europe and Africa (Figure S16). In Africa, C4 eudicots show a higher density in areas with <500 mm precipitation, especially in regions with cooler temperatures. In Europe, C4 eudicots show a higher density in areas with <400 mm precipitation and warm temperatures.
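Summary values of this kind (median and 95% interval of a climate variable across occurrence points) can be computed as in the following Python sketch. Interpreting the 95% interval as the range between the 2.5th and 97.5th percentiles is an assumption, as are the function name and the toy values.

```python
from statistics import median, quantiles


def summarise(values):
    """Median plus the 2.5th and 97.5th percentiles of a list of climate values."""
    # n=40 yields 39 cut points at 2.5% steps; the first and last bound the central 95%.
    cuts = quantiles(values, n=40, method="inclusive")
    return {"median": median(values), "low": cuts[0], "high": cuts[-1]}


# Toy values 0..40 standing in for annual mean precipitation at occurrence points.
summary = summarise(list(range(41)))
```

With the inclusive method, the percentiles are interpolated within the observed range, which suits a descriptive summary of the sampled points.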
Overall, the diversity and abundance of C4 plants increased with increasing annual mean temperature and a more pronounced dry season and decreased with colder temperatures and higher rainfall. C4 grasses tended to occur in wetter areas than C4 eudicots. Areas with cool and dry conditions were primarily colonised by C4 eudicots.

| Diversity of C 4 eudicots
Mapping C4 occurrence points at family level revealed many C4 eudicot hotspots of high taxonomic diversity at higher ranks, with C4 species from seven or more families occurring in the same area (Figure 4c). These hotspots were Mexico/Southern United States, Australia, South and West Africa, and South America.
In South America, the hotspot was located in the montane grasslands [...] for annual mean temperature - p-value <2.2e-16). These results point towards a wide adaptation range to diverse environmental conditions among the families and both groups (Figure 5). Most C4 eudicot species showed the classical atriplicoid leaf anatomy with little or no accompanying water storage tissue (Figure 6c). This anatomy, with minor differences, was also predominant in C4 grasses (Edwards & Voznesenskaya, 2011). In cases of succulence, there was often a deviation from atriplicoid anatomy.

| Traits
A high diversity in the succulent leaf anatomy of C4 eudicots was observed, with most C4 leaf types occurring only in a few species (Figure 6). However, the salsoloid leaf anatomy was clearly the most common leaf type among succulent C4 eudicots, not only in the Amaranthaceae but also in Aizoaceae and Polygonaceae (Table S1).
Among the succulent species, ca. 3% were stem succulents, while the rest were leaf succulents.

| Results of the literature survey on geographical origin of C4 photosynthesis
Seventeen genera from eight C4 eudicot families were distributed in the diversity hotspot of Mexico and the Southern United States (Table 2). Five of these genera encompassed more than ten C4 species found in the area, and six C4 genera originated in the area. Two of these in situ C4 genera, Euphorbia and Pectis, are species-rich.
Sixteen genera from ten C4 eudicot families were distributed in the diversity hotspot of Australia (Table 2). Two of these contained more than ten C4 species found in the area, with only one species-poor C4 lineage (C4 Tecticornia) unequivocally originating in the area. Atriplex and Gomphrena were the most species-rich C4 genera in Australia. While C4 Australian Atriplex originated ex situ, the area of origin of C4 in Gomphrena is currently unknown (Table 2 and citations therein). Overall, in 14 cases the area of origin of C4 photosynthesis is still insufficiently investigated (Table 2).

| DISCUSSION
We characterised the global occurrence of C4 eudicots, identified diversity hotspots and climatic preferences, assigned these to specific functional plant traits and conducted literature research on [...] S1) includes the current knowledge of physiological and morphological traits underlying large-scale patterns for C4 eudicots. Our approach combines the thus far known worldwide distribution of C4 eudicots, the evolution of photosynthesis and associated traits and climatic preferences of the individual lineages.
The table shows the number of species of a family within a biome (terrestrial ecoregions of the world; Olson et al., 2001). Species are considered present in a biome if at least 5% of the distribution points are in that biome.
So-called "big data," such as the many distribution points of many different species obtained here from GBIF, are valuable resources for wide-scale analyses and can provide novel insights. However, accurate and elaborate cleaning of the data is essential to obtain meaningful results (Zizka et al., 2019). Another challenge is that, in a project of this scope, verifying all identifications is unrealistic if not close to impossible. We also have to take into account the sampling density bias towards Europe, North America and Australia compared with large, poorly sampled areas of Africa and Asia, and thus have to interpret our findings for these regions with caution. For this reason, we additionally obtained species richness maps using WCVP. While WCVP data are less affected by sampling differences, they come with their own limitations, such as lower spatial resolution and unevenly sized sample areas (e.g., India stands out as particularly species-rich in C4 eudicots, in particular due to its size). Despite these challenges, using data from both sources allows for a more comprehensive and cautious analysis that takes into account the strengths and limitations of each dataset. These facts additionally underline the general need for well-curated data in our biodiversity repositories and for collecting efforts to fill the sampling gaps if these repositories are to be used by a broad community of researchers and to provide useful data for wide-scale analyses. [...] (Cerling et al., 1997; Wen et al., 2023).
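Two of the cleaning steps mentioned in this study (discarding occurrence points that fall outside a species' native distribution and removing duplicate coordinates per species) can be sketched as follows. This is an illustrative sketch only: the record fields (`species`, `lat`, `lon`, `country`) and the lookup `native_countries` are hypothetical names, and the actual workflow may have relied on dedicated tools and further steps.

```python
def clean_occurrences(records, native_countries):
    """Filter occurrence records in two steps.

    records: iterable of dicts with the (hypothetical) keys
        'species', 'lat', 'lon' and 'country' (country code).
    native_countries: dict mapping a species name to the set of country
        codes in which that species is considered native.
    """
    seen = set()
    kept = []
    for rec in records:
        # Step 1: drop points outside the native distribution.
        native = native_countries.get(rec["species"], set())
        if rec["country"] not in native:
            continue
        # Step 2: drop duplicate coordinates within the same species.
        key = (rec["species"], rec["lat"], rec["lon"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept
```

Species without an entry in `native_countries` are removed entirely in this sketch; a real pipeline might instead flag them for manual checking.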

|
The global maps of C4 species richness (Figure 2a-c [...] In these regions, diversification within genera is less prominent, but multiple C4 eudicot lineages evidently colonised these areas as well (Figures 2e,f and 4c).
Whether these regions of high diversity of C4 lineages represent areas of C4 origin, were just preferably colonised by already existing C4 lineages, or both needs to be evaluated for each C4 lineage and region in a phylogenetic and biogeographical context (see below).
Generally, both C4 grasses and C4 eudicots showed broad climatic ranges with a large overlap. On average, C4 grasses occurred in only slightly warmer but distinctly wetter areas than C4 eudicots (Figure 3). The occurrence of C4 grasses increases in regions with an annual rainfall of around 800 mm (Figure 3) and warm temperatures coupled with high insolation, conditions common in the southern hemisphere (Still et al., 2014). Where these climatic conditions are met, C4 grasses not only tend to show a high species diversity but often also dominate the vegetation, especially when fires occur regularly (Hoetzel et al., 2013; Sage, 2004; Still et al., 2014) [...] receive an annual mean precipitation between 400 and 900 mm mainly during the warm summer months (Bond, 2008; Low & Rebelo, 1996; Mills & Cowling, 2006) (December-February) and the Australian monsoon bringing up to 1300 mm rainfall (Ondei et al., 2016).
C4 eudicots predominantly colonise dry to very dry areas, with 80% of the occurrences in areas with <800 mm precipitation. One prominent example of an area where C4 eudicots show a higher diversity is the cold deserts of Eurasia, with temperatures below the freezing point for an extensive period of the year (Johnston, 1996; Rudov et al., 2020; Winter, 1981). These [...] of these genera (Kürschner, 2004; Lauterbach et al., 2019; Wu et al., 1994-2013). Less prominent C4 floral elements of cold Central Asian deserts include species-poor genera such as Horaninovia, Iljinia, Nanophyton, Piptoptera, Pyankovia, Turania and Xylosalsola. Together with Anabasis and Haloxylon, these are all members of an evolutionarily old C4 lineage within Salsoloideae (Amaranthaceae) that likely spread into the cold desert areas several times independently (Akhani et al., 2007; Kadereit et al., 2012). The biogeographically most comprehensively studied genus among these is Anabasis, for which the adjacent hot deserts of the Irano-Turanian Provinces were revealed as source areas for the species occurring in the cold deserts of the Mongolian Province (Lauterbach et al., 2019). Another example of vegetation with a high diversity of C4 eudicots is the hot deserts of Central Australia, where certain species of the large genus Atriplex such as A. holocarpa, A. lindleyi and A. vesicaria are highly abundant (Wilson, 1984).
In terms of preferred biomes, our analyses revealed that C4 grasses are most common in (sub)tropical grasslands, savannas and shrublands, whereas the highest species diversity of C4 eudicots was recorded from deserts and xeric shrublands (Figure 4). As expected, both C4 groups are scarcer in water-rich and cooler biomes, probably because the C4 syndrome is less advantageous there and C3 species are more competitive. In regions where C3 trees dominate, the C4 syndrome might be a disadvantage due to the limitations of the higher ATP demand of this pathway in shady habitats (Ehleringer & Björkman, 1977; Sage & McKown, 2006). Within C4 eudicots, Amaranthaceae are ecologically the most diverse and are the only C4 eudicot clade found at higher latitudes in boreal forests.
[...] 2), supporting the findings of Sage et al. (2011) that Mexico and the Southern United States are a hotspot of C4 lineage diversity. For at least six of these lineages, the current molecular phylogenies deliver sufficient evidence for an in situ origin of C4 photosynthesis within this area, with two of these being species-rich (Table 2).

|
One is the neotropical genus Pectis (Tageteae, Asteraceae), which includes approximately 90 C4 species and is represented by about 47 species in Mexico and the Southern United States (Hansen et al., 2016). The sister genus Porophyllum performs C3 photosynthesis and is also distributed in tropical and subtropical America. Hansen et al. (2016) show that the transition to C4 photosynthesis most likely occurred during the Late Miocene in the stem lineage of Pectis, which was probably distributed in North/Central Mexico.
Within the mega-diverse family Euphorbiaceae, the evolution of carbon-concentrating mechanisms led to diversification bursts (Horn et al., 2014). Here we find the second example of an in situ origin of C4 photosynthesis in Mexico/Southern United States with subsequent diversification (Horn et al., 2014; Yang & Berry, 2011). The [...] species, mainly distributed in southern North America, with few species occurring in the Caribbean and South America (Powell, 1978).
The C4 genera of Nyctaginaceae, Boerhavia and Allionia, belong to the "North American xerophytic clade" of the family (Douglas & Manos, 2007; Khoshravesh et al., 2020). This clade likely diversified in the deserts of the southwestern United States and northwestern Mexico because all genera are either confined to or represented in the area (Douglas & Manos, 2007). Boerhavia subsequently spread and diversified in subtropical regions worldwide. The situation in the species-rich Euploca (Boraginaceae) is challenging to assess as the published molecular phylogeny lacks support along the backbone (Frohlich et al., 2022). Nevertheless, an in situ origin of the North American C4 species seems likely. Amaranthaceae s.l. are well represented in Mexico and the Southern United States, with eight genera containing native C4 species (Table 2). However, an in situ origin seems likely only for Tidestromia, which consists entirely of C4 species, all but one of which are endemic to the region (Sánchez-del Pino & Motley, 2010). Froelichia, Guilleminea and Gomphrena belong to a species-rich and widespread C4 clade that probably started to diversify during the Mid-Miocene (Limarino & Borsch, 2020). However, due to insufficient sampling, it is currently impossible to infer whether the C4 pathway originated in tropical South America or in subtropical southern North America. Insufficient phylogenetic information also prevents us from inferring the origin of North American Amaranthus species (Waselkov et al., 2018). However, since the entire genus exhibits C4 and probably originated in South America, Amaranthus seems to be a migratory C4 lineage in Mexico and the Southern United States.
Other migratory C4 lineages are Trianthema, which spread into the area from Africa (Bohley et al., 2015), Atriplex, which arrived from South America (Žerdoner Čalasan et al., 2022), Portulaca (Ocampo & Columbus, 2012; Tamboli et al., 2022) and Suaeda (Schütze et al., 2003). The biogeography of the C4 genus Kallstroemia, which is distributed from Central and Southern North America to tropical and subtropical South America, remains unclear due to limited phylogenetic support (Lauterbach et al., 2019).
Nowadays, the C4 hotspot in Mexico and the Southern United States receives a limited amount of rain, ranging from around 50-[...] cast by the mountain ranges of Sierra Madre Occidental and Sierra Nevada, and Sierra Madre Oriental on either side, which are of Late Mesozoic and Early Cenozoic age (Dickinson, 2004).
The earliest evidence of desertification in this area dates to the Middle Miocene and corresponds with the diversification events of arid-adapted lineages (Eronen et al., 2012; Hyland et al., 2019; Said Gutiérrez-Ortega et al., 2018; Vásquez-Cruz & Sosa, 2020).
This also applies to some of the C4 lineages that originated in situ, such as Pectis, Flaveria (both Asteraceae) and Allionia and Boerhavia (both Nyctaginaceae), which have originated and spread since the Mid to Late Miocene (Table 2). We suggest that the overall high diversity of ancestral C3 lineages adapted to arid conditions in the area, together with the high selective pressure in favour of the evolution of a carbon-concentrating mechanism, is responsible for the exceptionally high diversity of C4 lineages that originated in Mexico and the Southern United States. In addition to these in situ C4 lineages, a high number of migratory C4 lineages occur, finding suitable growing conditions in the area.

| C4 hotspot Australia
Within Australia, two regions of high C4 plant diversity with different precipitation profiles are observed. The first one comprises the deserts and xeric shrublands of the Eremaean floristic region (sensu Ebach et al., 2015), rich in C4 Atriplex (about 60 species) but also in C3 Camphorosmeae (about 150 species) and Chenopodium (about 50 species; Kadereit et al., 2005). Most other Australian C4 eudicots are restricted to the northern parts of the continent, where tropical and subtropical grasslands, savannas and shrublands prevail.
Here the biggest C4 genera are Gomphrena (Amaranthaceae) with 30 C4 species, followed by Euphorbia (Euphorbiaceae) with seven and Portulaca (Portulacaceae) and Polycarpaea (Caryophyllaceae) with five C4 species each.
While there are many different C4 plant lineages known from these areas, the majority of them did not evolve in situ ( [...] Aizoaceae (Trianthema and Zaleya) with Africa as their source area (Bohley et al., 2015). The spatial and temporal aspects of other C4 representatives of the Australian flora (Euploca, Glossocardia, Gomphrena, Polycarpaea, Portulaca and Tribulus) remain unclear.
Limited data, however, point towards ex situ evolution of the C4 syndrome in these genera.
The only clear C4 in situ origin is currently known from Tecticornia (Shepherd et al., 2005; Voznesenskaya et al., 2008).
This taxon is adapted to hyper-saline conditions and builds extensive vegetation stands along the edges of Australian inland salt lakes (Shepherd et al., 2004). This genus comprises about 60 species, out of which only a clade of five taxa is known to perform C4 photosynthesis. While many C3 species have rather restricted distribution areas (which may or may not be a result of the lack of surveys in the poorly accessible Australian outback), one of the two C4 species, Tecticornia indica, exhibits a wide distribution range along saline lake shores across the whole continent (Wilson, 1984).
Another, albeit less clear, example is the small Australian genus Tribulopis. Conflicting phylogenetic signals between the nuclear and chloroplast-encoded genes point towards a complex evolutionary history of this taxon (Lauterbach et al., 2019). In contrast to Tecticornia, here the C3 representatives show a wider distribution range, whereas the C4 species tend to be geographically restricted (Wilson, 1984). While the reasons for this peculiar distribution remain unknown, this example clearly indicates that the factors promoting the evolution of C4 photosynthesis are manifold and that each individual C4 lineage has its own unique evolutionary history. Both taxa arrived in Australia post-Miocene (Piirainen et al., 2017; Wu et al., 2018), which coincides with several geological and climatic features that initiated and promoted the aridification of this continent.
These include the northward drift towards the equator, the expansion of the Antarctic ice cap, and the formation of the circum-Antarctic Ocean current and the subtropical high-pressure system (Fujioka & Chappell, 2010; Kemp, 1978).

| Other C4 eudicot hotspots
The scarce dataset revealed three regions in Africa that seem to favour C4 eudicots. The first includes the tropical and subtropical grasslands, savannas and shrublands of Africa. The second C4-rich region is located in south-east Africa. Here we should mention that the Drakensberg Mountain Centre located in this region is a known biodiversity hotspot, which may or may not influence the number of C4 species (Carbutt, 2019; Popp & Kalwij, 2021). The third region is in southwestern Africa (Fisher et al., 2015; Schulze & Schulze, 1976; Vogel & Seely, 1977).
In South America, two centres of C4 eudicot biodiversity are [...] Pliocene (Ocampo & Columbus, 2012), following the expansion of C4 vegetation due to decreased CO2 levels and increased aridity (Salzmann et al., 2008; Strömberg, 2011). Nevertheless, the north-eastern portion of Brazil became consistently arid very recently in geological history, at the end of the Younger Dryas (Auler et al., 2004). This leads us to believe that the high C4 diversity of Portulaca in that region is of refugial origin, as continuous wetter interglacial cycles prior to that diminished any advantage of C4 species over their C3 congeners. A poorly resolved phylogeny and the lack of divergence time estimates preclude us from discussing potential stages in the evolutionary history of Euploca (Frohlich et al., 2022). For C4 Atriplex lineages, molecular phylogenetic studies show two long-distance dispersal events to reach South America, one possibly from continental Asia and one from North America (Žerdoner Čalasan et al., 2022).
It is important to mention that there are also regions, such as the Central Asian Deserts, that are not a main area of origin for C4 lineages but have a high C4 lineage diversity due to extensive migration. However, while the knowledge of the geological history of this region has increased dramatically in recent years (Barbolini et al., 2020; Hurka et al., 2019), the evolutionary history of its flora remains largely unknown (Seidl et al., 2021; Žerdoner Čalasan et al., 2021).

| Functional traits lead to ecological diversity in C4 eudicots
Precipitation and temperature preferences among the C4 eudicot families differ significantly (Figure 5). This supports the well-known fact that C4 is advantageous under different environmental conditions. Various phylogenetic analyses (cited in [...] plants (Schlüter & Weber, 2020), it required a variety of complex developmental changes and thus possibly represents a bottleneck to the C4 origin (Lauterbach et al., 2019). Hence, the high number of independent C4 origins and their morphological, ecological and physiological diversity in eudicot lineages is unexpected.
Kranz anatomy is an important unifying trait for almost all C4 species, and the majority of species show a similar (so-called atriplicoid) C4 anatomy with Kranz cells (or bundle sheath cells) surrounding the vascular bundle and an outer ring of specialised mesophyll cells (Figure 6c). However, many eudicot C4 lineages deviate from this common anatomical type and show additional anatomical specialisations related mostly to leaf or stem succulence. All lineages that combine succulence and C4 photosynthesis seem to be derived from ancestrally succulent C3 lineages (e.g., Tetraena simplex in Zygophyllaceae (Lauterbach et al., 2016) [...] (Pinto et al., 2016; Schulze et al., 1996; Taub, 2000). Within C4 eudicots, the NAD-ME subtype seems more common than in C4 grasses, where the NADP-ME pathway prevails (Sage et al., 2011). This might also explain why C4 eudicots managed to successfully occupy even the most arid regions around the globe, whereas C4 grasses are on average found in regions with more rainfall (Figure 5a).

| A brief summarising family perspective of C4 evolution in eudicots
The morphological, physiological and ecological diversity within C4 eudicots is immense. The most diverse plant family is Amaranthaceae, in which C4 photosynthesis developed several times independently in different environmental conditions (Kadereit et al., 2003, 2012; Sage et al., 2007) [...] Voznesenskaya et al., 2002). Furthermore, the shift from C3 photosynthesis in cotyledons to C4 in adult leaves has so far been reported only from various species of Salsoleae (Lauterbach et al., 2017). Two species of Tecticornia represent the only known stem-succulent C4 species with window cells in their mesophyll (Marchesini et al., 2014; Moir-Barnetson, 2014), and the Salsola divaricata agg. represents the first C4 lineage that was shown to have arisen from ancestral hybridisation events between a C4 and a C3 lineage within Salsoleae (Morales-Briones & Kadereit, 2023).
Amaranthaceae are the family richest in C4 species, followed by Euphorbiaceae and Asteraceae (Figure 1). In the predominantly tropical Euphorbiaceae, the C4 syndrome evolved only once, resulting in a clade of about 150 species found for the most part, but not exclusively, in seasonally dry and arid zones. In Hawaii, Euphorbia species occur in habitats ranging from arid coastal beaches to rainforests. One example is Euphorbia clusiifolia Hook. & Arn. (formerly E. forbesii Sherff, not accepted by Govaerts et al., 2000), a tree species that grows up to 13 m high and is endemic to the cool, mesic, subtropical forests of the geologically young Hawaiian Islands (Pearcy & Ehleringer, 1984). Further C4 trees include Euphorbia olowaluana and E. remyi, which occur in dry open and subalpine forests and in humid forests, respectively (Young et al., 2020). These examples strongly indicate that the C4 pathway is not limited to herbaceous or shrubby life forms but may also occur in trees, although these examples evolved woodiness secondarily from herbaceous ancestors on the islands (Zizka et al., 2022). This link is particularly interesting since the secondary evolution of woodiness may in turn be linked to drought adaptation in other lineages (Dória et al., 2018; Hooft van Huysduynen et al., 2021). Many other C4 Euphorbia species occur in the understory of Hawaiian rainforests, where precipitation is rather high, averaging 1200 mm to 1800 mm annually (Pearcy & Ehleringer, 1984), conditions in which C4 photosynthesis does not seem to have a physiological advantage.
In Asteraceae, the C4 syndrome originated several times, resulting in about 100 species that often occur sympatrically with their C3 congeners in dry and arid habitats. Due to the presence of many C3-C4 intermediate species, Flaveria became a model of C4 evolution (Monson & Moore, 1989; Sage et al., 2014). The species of the genus Isostigma, native to Bolivia, Brazil and North Argentina, show Kranz anatomy in the stems and two different types of Kranz anatomy in leaves (Peter, 2009). The Eryngiophyllum type, common in hot and arid conditions, has one Kranz unit per leaf and sclerenchyma tissue, whereas the Isostigma type, more common in places with higher precipitation, shows more than one Kranz unit per leaf and no sclerenchyma tissue (Peter & Katinas, 2003). While the majority of Aizoaceae rely on the CAM carbon fixation pathway, there are about 30 species that perform C4 photosynthesis. The details of anatomy and biochemistry vary greatly among closely related species, with several origins of both NAD-ME and NADP-ME pathways (Bohley et al., 2015). The most widely distributed C4 species in this group under current taxonomic treatments is Trianthema portulacastrum, which can carry out CAM and C4 (Winter et al., 2021). While this physiological plasticity possibly facilitated its wide distribution, other widely distributed non-C4 (but also weak CAM; Winter et al., 2019) species such as Sesuvium portulacastrum show that C4, albeit beneficial, cannot solely explain a species' wide distribution. In the monogeneric Portulacaceae, about two-thirds of the currently known species of the genus Portulaca exhibit C4 anatomical features, using NAD-ME and NADP-ME carbon fixation pathways in closely related lineages (Ocampo et al., 2013). Furthermore, many species also show reversible physiological signs of an effective CAM carbon fixation pathway under extreme drought stress (Holtum et al., 2017; Moreno-Villena et al., 2022). While this physiological plasticity may again explain the wide distribution range of the Portulaca oleracea complex, its taxonomic uncertainty hampers our understanding of its evolutionary history and its true distribution extent (Ferrari et al., 2020).
While the majority of the C4 eudicots occur in regions with a moderate drought period, Polygonaceae, Scrophulariaceae and Zygophyllaceae have the highest C4 species diversity in much drier arid and semi-arid regions. The shrubby species of Calligonum (Polygonaceae) possess only the NAD-ME mechanism for carbon fixation (more effective in drier regions), have a salsoloid Kranz type and are found only in cold and dry deserts of Eurasia and Africa (Sage et al., 2011; Sage, 2017; Pyankov et al., 2000, 2010; Muhaidat et al., 2012; Figure S12b; Table S1). In Africa, it shares its distribution range with the only C4 taxon from Scrophulariaceae, Anticharis, [...]
Amaranthaceae s.l. (Caryophyllales) is by far the most species-rich [...]
KEYWORDS biome, C4 photosynthesis, climatic preferences, desert, GBIF, salt tolerance, succulence
TAXONOMY CLASSIFICATION Biodiversity ecology, Biogeography, Botany
FIGURE 1 Angiosperm families in eudicots (a) and monocots (b) including C4 species. The first number represents the total number of C4 species per family according to [...]
[...] occurrence points outside the original distribution areas resulted in the elimination of 375,560 distribution points. Duplicate coordinates per species were removed, resulting in an additional 464,424 occurrence points being excluded. After all cleaning steps, approximately 53.25% of the distribution points were removed from the raw C4 grasses dataset, leaving 1,073,522 distribution points. Manual cleanup of incorrectly synonymised species was not performed, as only the genus level was considered. Intermediate analyses with uncleaned or only partly cleaned data showed that these datasets would have led to different results (Figures S1-S15 illustrate this). The false occurrence data are prevalent to the extent that they blur any meaningful result of the clean data.

FIGURE 2 Global maps of total C4 grasses and C4 eudicots. A. Richness per 100 × 100 km grid cell. B. Richness per botanical country. (a) and (b) Species richness of C4 grasses. (c) and (d) Species richness of C4 eudicots. (e) and (f) Genus richness of C4 eudicots. (g) Rank difference (botanical countries ranked by the number of respective species) C4 grasses - C4 eudicots. Blue indicates botanical countries more important for C4 grasses, and red indicates botanical countries more important for C4 eudicots.
[...]). For the C4 eudicots, the hotspot of high diversity at generic and family rank was in Mexico and extended further north into the United States, where deserts and xeric shrublands prevail. The Australian hotspot lay in the deserts and xeric shrublands of Central Australia but also extended into the (sub)tropical region in the north and the Mediterranean region in the west (Figures 2e,f and 4c).
C4 grasses showed four diversity hotspots: (1) the tropical and subtropical open coniferous forests, as well as the adjacent deserts of thorn scrubs with fleshy plants and pastures at slightly higher elevations of Mexico, where a temperate to semi-arid climate prevails; (2) the tropical and subtropical grasslands and shrublands of Queensland, Australia; (3) South America; and (4) Africa in tropical and subtropical grasslands, savannas and shrublands (Figure 2a,b).
[...] and shrublands of Argentinian Patagonia, whereas the Australian hotspot at the family level expanded into the tropical and subtropical grasslands and shrublands. At the level of genus richness of C4 eudicots, additional diversity hotspots were retrieved: in Asia, the temperate grasslands, savannas and shrublands and the Altai-Sayan mountain range, and in Europe, the Mediterranean shrublands (Figure 2e,f). Amaranthaceae, Asteraceae, Euphorbiaceae, Portulacaceae and Zygophyllaceae are the five eudicot families with the highest numbers of C4 species (Figure 4a). While C4 Amaranthaceae showed high species richness in many different biomes, the major biomes of C4 eudicots were tropical and subtropical moist and dry broadleaf forests (Biome 1 & 2), tropical, subtropical and temperate grasslands, savannas and shrublands (Biome 7 & 8), and deserts and xeric shrublands (Biome 13). We found statistically significant differences in precipitation and temperature between C4 eudicot families (Kruskal-Wallis: for annual mean precipitation: chi-squared = 231.34, df = 14, p-value <2.2e-16; for annual mean temperature: chi-squared = 161.73, df = 14, p-value <2.2e-16) and between C4 eudicots and C4 grasses (Mann-Whitney U test: for annual mean precipitation: p-value <2.2e-16; [...]

TABLE 1 The impact of filtering on the raw occurrence datasets of the various eudicot plant families containing C4 species (including the number of occurrence points at each step). Beginning with the raw record list. Nomenclatural and taxonomic checking including [the correction of] wrong synonymisations. d Deleting outliers: after checking whether the native distribution information of plantsoftheworldonline.org matches the distribution country (Country Code) of each distribution point from GBIF.
Focusing on the five families (Amaranthaceae, Asteraceae, Euphorbiaceae, Portulacaceae and Zygophyllaceae) considered as species-rich per grid in Figure 4, we noticed that Amaranthaceae differed significantly (Post-hoc Tukey test, p-value of <.01) in the annual mean temperature and precipitation range from the other four families, except when comparing Amaranthaceae and Zygophyllaceae for the precipitation range (Post-hoc Tukey test, p-value of 1). Further multiple comparisons between the four other plant families showed no significant difference in terms of temperature (Post-hoc Tukey test, p-value of >.06). In terms of precipitation range, the comparisons between Zygophyllaceae-Asteraceae, Zygophyllaceae-Euphorbiaceae and Zygophyllaceae-Portulacaceae showed a significant difference (Post-hoc Tukey test, p-value of <.01). Amaranthaceae had the lowest mean annual temperature (17.77°C) of these five families. Furthermore, the mean value of annual precipitation in Amaranthaceae was 333 mm. For Asteraceae, the mean of the preferred annual mean temperature was 22.91°C and the mean annual precipitation was 929 mm; the family thus occurred in wetter and warmer areas than the other four families. C4 species of Euphorbiaceae preferred rather wet areas (mean = 797 mm) with a wide interquartile range and temperatures that intersect with the preferred areas of Amaranthaceae and Asteraceae (mean = 20.62°C). A similar pattern was retrieved for C4 species of the Portulacaceae. Their preferred temperature lay between the values of Amaranthaceae and Asteraceae (mean = 20.69°C), and the C4 species of Portulacaceae occurred in areas with an annual precipitation of 878 mm. Zygophyllaceae together with Asteraceae preferred the warmest areas among these five families (Zygophyllaceae: mean = 22.71°C), and the mean value of the preferred annual precipitation for Zygophyllaceae was second lowest at 459 mm. Families with only one C4 lineage (genus) stood out among the others. In Polygonaceae, the single C4 genus Calligonum, which occurs in the cold deserts of Central Asia, was particularly conspicuous in its climatic preferences, with the lowest mean temperature of 11.48°C and a precipitation preference in the very dry range (mean = 136 mm). A different picture was observed in C4 species of the family Caryophyllaceae, which also contains only one C4 genus, Polycarpaea. These C4 species preferred comparatively warmer (mean = 26.72°C) and wetter regions (mean = 895 mm).
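The family-level comparisons rest on a Kruskal-Wallis test, which ranks all observations jointly and compares the rank sums of the groups. The following minimal sketch computes the H statistic for illustration only: it omits the tie correction, does not compute the p-value (which would be obtained from a chi-squared distribution with k - 1 degrees of freedom), and uses hypothetical function names; the reported values were presumably produced with standard statistical software.

```python
def average_ranks(values):
    """Return 1-based ranks of the values, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (without tie correction) and degrees of freedom."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(groups) - 1
```

Here each group would hold, for example, the annual mean precipitation values of one C4 eudicot family, so that 15 families give the 14 degrees of freedom reported in this study.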
These are distributed around a mean value of annual precipitation of 444 mm, which is higher than the mean for all C4 eudicots. Salt tolerance is documented in seven families with about 485 (49%) species, 333 (33%) of which are succulents from five families (Aizoaceae, Amaranthaceae, Polygonaceae, Portulacaceae, Zygophyllaceae). The biochemical subtypes were also examined. 459 (46%) investigated species within 11 families were found to have the NAD-malic enzyme (NAD-ME) as the predominant decarboxylase. In contrast, 559 (56%) species within ten families exhibited the NADP-ME biochemical subtype.
FIGURE 3 Scatterplot showing annual mean temperature (°C) and annual mean precipitation (mm) data of C4 grasses (green) and C4 eudicot (blue) occurrence points.
[...] The five C4 eudicot families with the highest number of species in this study are shown (Figure 1a), as well as C4 eudicots and C4 grasses in general. Colours correspond to the biomes shown in (b). Saturation shows the percentage of species per family or per group (C4 eudicots, C4 grasses) in each biome. (b) Terrestrial biomes of the world (Olson et al., 2001). (c) Global richness map of C4 eudicot families, showing the number of families occurring in each grid (100 × 100 km).
FIGURE 5 Differences in climatic preferences of the C4 species in eudicot families, and C4 eudicots and C4 grasses in general. A Kruskal-Wallis test was used over C4 eudicot families: p < .001, statistically significant. (a) Annual mean precipitation and (b) annual mean temperature.
[...] spatiotemporal histories of C4 lineages. The database attached to this paper (Table [...]

FIGURE 6 (a) Distribution of Kranz types present in C4 eudicots. (b) Distribution of the traits summarised within the C4 eudicots, separated into the number of C4 species and the number of families in which the traits occur. (c) Schematic illustration of the four common C4 leaf types (schematic drawings of leaf anatomical types are adapted from Bohley et al., 2015). (d) Range and mean of annual precipitation of species with NADP-ME and NAD-ME as primary decarboxylating enzymes.

TABLE 2 List of eudicot C4 genera occurring in the diversity hotspots Mexico/Southern United States and Australia according to our occurrence points and of C4 genera with molecular evidence of C4 in situ origin within the remaining diversity hotspots. Genera that are species-rich (>10 species) in the respective region are indicated in bold (x). Genera that likely originated in the respective region are marked with *.

US (Euphorbia subg. Chamaesyce sect. Anisophyllum subsect. Hypericifoliae) Yang and Berry (2011), Horn et al.
) reveal two shared hotspots for C4 grasses and C4 eudicots: one in Mexico/Southern United States and one in Australia. In these two regions, C4 lineages seem to have diversified more intensively than in other parts of the world, and in the case of the C4 eudicots, this diversity has been recruited independently from multiple families and even multiple times within one family (Figures 2e,f and 4c; Table 2). Two additional hotspot regions for the C4 grasses are found in South and West Africa. However, since many areas of Africa are poorly sampled, these two regions might appear as C4 grass hotspots because they are proportionally more densely sampled than other regions in Africa. Additionally, looking at the richness of C4 grasses per botanical country, East (Tanzania) and Central Africa also count as rich in C4 species (Figure 2b). The richness maps showing the diversity at the genus (Figure 2e,f) and family level (Figure 4c), however, indicate possible further smaller hotspots for the C4 eudicots in Africa (as in grasses), as well as in Patagonia, Central Asia and the Mediterranean.
Here are three examples where this is the case: (i) C4 grasslands in the highveld of southern Africa, dominated by Hyparrhenia hirta (L.) Stapf (Panicoideae) and Sporobolus pyramidalis P.Beauv. (Chloridoideae); (ii) species-rich C4 grasslands in the tropical Sudanian savanna near the Volta, Benue and Niger rivers experience a peak of summer precipitation of 600 mm in the north and 1000 mm in the south, and the West African monsoons that occur between June and August result in warmer and wetter summers that support C4 vegetation (Olusegun et al., 2018); (iii) species-rich C4 grasslands in north-east Queensland and the Northern Territory, Australia, are associated with the tropical to subtropical climate along the coastal strip, the warm and wet summer months

are dominated by woody (sub)shrubs Haloxylon persicum and H. ammodendron as well as Calligonum aphyllum, C. mongolicum and Anabasis brevifolia. All five C4 species represent an integral part of the cold desert vegetation and no closely related C3 relatives are known from ei-

Both C4 grasses and C4 eudicots are fairly common even in (sub)tropical moist broadleaf forests. All these biomes, except the latter, are characterised by scarcity and/or seasonality of precipitation. For example, one part of the C4 grasses occurs in the rainforests of the Australasian realm, such as the tropical and Central Range montane rainforests of Queensland. Another part occurs in the Indomalayan realm, such as the Borneo lowland rainforests, the Kayah-Karen and Sri Lanka montane rainforests and the South Taiwan monsoon rainforests.
Did C4 lineages originate in the diversity hotspots or did C4 lineages colonise these areas?

4.2.1 | C4 hotspot Mexico/southern United States

The C4 hotspot in Mexico and the Southern United States encompasses mostly deserts and xeric shrublands with different climatic regimes. This southern tip of the Nearctic realm comprises warm deserts such as the Mojave, Sonoran and Chihuahuan Deserts as well as the cold deserts of the Great Basin, and large adjacent and equally diverse semi-desert areas (Laity, 2008). This climate and habitat diversity probably promoted speciation, making these areas particularly species-rich. About two-thirds of the flora is endemic to this region and the most common plant families represented in this flora are Cactaceae, Asteraceae and Boraginaceae (Villarreal-Quintanilla et al., 2017). The majority of endemic species and also the most widely distributed species that typify these landscapes, such as Ambrosia monogyra, Artemisia filifolia and Flourensia cernua (Asteraceae), Ephedra torreyana (Ephedraceae), Larrea tridentata (Zygophyllaceae), Penstemon thurberi (Plantaginaceae), Poliomintha incana (Lamiaceae), Prosopis glandulosa and Psorothamnus scoparius (Fabaceae) and Yucca elata (Asparagaceae; Shreve, 1939), however, do not perform C4 photosynthesis. Nevertheless, here we recorded many C4 species from 17 eudicot genera belonging to eight different families (Table

C4 pathway evolved only once within the subgenus Chamaesyce at the stem of section Anisophyllum subsection Hypericifoliae during the Mid-Miocene and gave rise to approximately 350 C4 species, making it the largest eudicot C4 lineage known thus far. For the well-known C4 model Flaveria (Asteraceae; Monson & Moore, 1989; Sage et al., 2014) the phylogenetic tree topology suggests that the C4 pathway originated in this area, likely during the Pliocene (Morales-Briones & Kadereit, 2023). The genus comprises 21 spe-

C4 representatives of Euphorbiaceae and Asteraceae rely solely on the NADP-ME pathway with the exception of Euphorbia mongolica, which can be found in highly seasonal temperate East Asia (Zang et al., 2021) and exhibits the NAD-ME pathway. Preliminary data suggest that C4 photosynthesis evolved only once in Caryophyllaceae, in the genus Polycarpaea, sometime in the Pliocene (Kool, 2012). Apart from one widely distributed C4 species, all the other C4 species are restricted predominantly to either subtropical forests and (semi)arid zones of Australia or to western Africa and adjacent regions. The highest species diversity of the C3 representatives is found in the more mesic Canary Islands. The low number of extremely xeric species might be explained by the presence of the NADP-ME mechanism, which seems less efficient under extremely arid conditions, at least in grasses (Rao & Dixon, 2016). In Boraginaceae, C4 photosynthesis probably arose at least twice independently, and while the genera Euploca and Heliotropium are distributed worldwide, the C4 representatives of either genus are found predominantly in seasonally dry and (semi)arid habitats around the globe. A similar pattern is also observed in C4 representatives of Cleomaceae, Molluginaceae and Nyctaginaceae, in all three of which the diversity of C4 species is markedly lower than that of C3 species.
whose four species are restricted to this warm region and exhibit the NAD-ME mechanism for carbon fixation as well. In Zygophyllaceae, C4 photosynthesis developed several times independently and the majority of the representatives are found in hot deserts around the world (Lauterbach et al., 2019). Interestingly, apart from one species (Tetraena (=Zygophyllum) simplex), all C4 species of Tribuloideae seem to possess the NADP-ME subtype. While it remains unclear why this is the case, Tetraena simplex is also the only species whose C4 anatomy resembles the kochioid Kranz type, while the rest of the taxa show the atriplicoid Kranz type. This is not the case in other taxa, where NADP-ME and NAD-ME are not restricted to a particular Kranz anatomy. C4 representatives of Acanthaceae and Gisekiaceae have a comparable distribution area and are found across seasonally dry to arid habitats of (sub)tropical Africa and southwestern Asia. In both families the C4 syndrome probably arose only once (Bissinger et al., 2014; Fisher et al., 2015). While Blepharis (Acanthaceae) includes C3 representatives as well, and both NAD-ME and NADP-ME pathways are known from its C4 representatives, in Gisekia (monogeneric Gisekiaceae) all currently known species perform NAD-ME-based C4 photosynthesis. Despite this restriction, both taxa are found along a wide range of open and often disturbed arid to mesic habitats (Bissinger et al., 2014).

5 | CONCLUSION/OUTLOOK

Although the climatic ranges of C4 grasses and C4 eudicots mostly overlap, our findings suggest that C4 eudicots tend to inhabit substantially drier regions than their C4 monocot counterparts. This could be due to the numerous phylogenetically less constrained morphological, ecological and physiological adaptations to harsh environments present in C4 eudicots. C4 eudicot lineages employing the NAD-ME decarboxylating enzyme inhabit notably drier regions than those utilising the NADP-ME decarboxylating enzyme. We conclude that C4 evolution in most eudicot lineages occurred in ancestrally drought-adapted clades, thereby facilitating the expansion of such plants in these habitats and allowing them to colonise even drier areas. We identified primary hotspots of C4 eudicots that are corroborated by phylogenetic studies as source and sink areas of C4 diversity, respectively: the arid regions in Mexico/Southern United States, Australia and Central Asia, where multiple C4 eudicot lineages diversified independently, possibly due to increasingly drier environmental conditions. Our review of the literature on the evolutionary histories of individual taxa in these regions, although that literature is scarce, indicates that the evolutionary history of C4 elements differs greatly among them. Mexico and the Southern United States exhibit a high number of C4 lineages that originated in situ, whereas the high C4 diversity in Australia and the Central Asian deserts results primarily from the secondary migration of C4 lineages into these areas.

Table 2
recognised, both under strong influence of arid desert or steppe climate with a pronounced dry period. These two centres of C4 diversity are most evident in Euploca (Boraginaceae) and Portulaca (Portulacaceae). However, they are also found in other eudicotyledons such as Amaranthaceae, Asteraceae and Euphorbiaceae. Portulaca has the highest C4 species diversity in north-eastern Brazil. The C4 Portulaca clade started to radiate in the late Miocene/